9 research outputs found

    Explicit diversification of event aspects for temporal summarization

    During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of events. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned, while also increasing summary timeliness.
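
    The aspect-coverage idea in this abstract can be illustrated with a small greedy selector. This is only a sketch under stated assumptions: the abstract does not name the framework it extends, so an xQuAD-style objective is assumed here, and the function name `diversify`, its parameters, and the inputs (per-snippet relevance, per-aspect importance, and snippet-to-aspect coverage scores in [0, 1]) are hypothetical.

```python
def diversify(candidates, relevance, aspect_weight, coverage, k, lam=0.5):
    """Greedily pick up to k snippets, trading relevance against aspect coverage.

    Assumed xQuAD-style objective (not taken from the abstract):
      score(s) = (1 - lam) * relevance[s]
               + lam * sum_a aspect_weight[a] * coverage[s][a] * not_covered[a]
    """
    selected = []
    remaining = list(candidates)
    # not_covered[a]: estimated probability that aspect a is still
    # uncovered by the snippets selected so far.
    not_covered = {a: 1.0 for a in aspect_weight}

    while remaining and len(selected) < k:
        def score(s):
            diversity = sum(
                aspect_weight[a] * coverage[s].get(a, 0.0) * not_covered[a]
                for a in aspect_weight
            )
            return (1.0 - lam) * relevance[s] + lam * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        # Aspects covered by the chosen snippet count for less from now on.
        for a in aspect_weight:
            not_covered[a] *= 1.0 - coverage[best].get(a, 0.0)
    return selected
```

    With `lam=0` the selector reduces to a plain relevance ranking; larger values favour snippets that cover aspects not yet represented in the summary.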

    Diversity and novelty in information retrieval

    This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains, namely, in the context of search engines, recommender systems, and data streams.

    Explicit web search result diversification


    Learning to rank query suggestions for adhoc and diversity search

    Query suggestions have become pervasive in modern web search, as a mechanism to guide users towards a better representation of their information need. In this article, we propose a ranking approach for producing effective query suggestions. In particular, we devise a structured representation of candidate suggestions mined from a query log that leverages evidence from other queries with a common session or a common click. This enriched representation not only helps overcome data sparsity for long-tail queries, but also leads to multiple ranking criteria, which we integrate as features for learning to rank query suggestions. To validate our approach, we build upon existing efforts for web search evaluation and propose a novel framework for the quantitative assessment of query suggestion effectiveness. Thorough experiments using publicly available data from the TREC Web track show that our approach provides effective suggestions for adhoc and diversity search.

    About learning models with multiple query dependent features

    Several questions remain unanswered by the existing literature concerning the deployment of query dependent features within learning to rank. In this work, we investigate three research questions to empirically ascertain best practices for learning to rank deployments: (i) Previous work in data fusion that pre-dates learning to rank showed that while different retrieval systems could be effectively combined, the combination of multiple models within the same system was not as effective. In contrast, the existing learning to rank datasets (e.g. LETOR) often deploy multiple weighting models as query dependent features within a single system, raising the question as to whether such combination is needed. (ii) Next, we investigate whether the training of weighting model parameters, traditionally required for effective retrieval, is necessary within a learning to rank context. (iii) Finally, we note that existing learning to rank datasets use weighting model features calculated on different fields (e.g. title, content or anchor text), even though such weighting models have been criticised in the literature. Experiments to address these three questions are conducted on Web search datasets, using various weighting models as query dependent features, and typical query independent features, which are combined using three learning to rank techniques. In particular, we show and explain why multiple weighting models should be deployed as features. Moreover, we unexpectedly find that training the weighting model's parameters degrades the effectiveness of the learned models. Finally, we show that computing a weighting model separately for each field is less effective than more theoretically-sound field-based weighting models.

    Modelling efficient novelty-based search result diversification in metric spaces

    Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document–document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the last, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
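
    To make the pivoting idea concrete, the following is a minimal sketch, not the authors' algorithm: it assumes documents are represented as term sets, uses Jaccard distance (a metric) as the distance function, and treats a document as novel when it lies outside a fixed radius of every pivot chosen so far. The names `jaccard_distance`, `pivot_diversify`, and the `radius` parameter are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two term sets (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pivot_diversify(ranking, terms, radius):
    """Re-rank so that documents outside the range of every pivot come first.

    Each document is compared only against the current pivots rather than
    against all previously selected documents, so the number of distance
    computations is bounded by n * (number of pivots).
    """
    pivots, novel, redundant = [], [], []
    for doc in ranking:
        is_novel = all(
            jaccard_distance(terms[doc], terms[p]) > radius for p in pivots
        )
        if is_novel:
            pivots.append(doc)  # assumption: every novel document becomes a pivot
            novel.append(doc)
        else:
            redundant.append(doc)
    # Novel documents are promoted; the rest keep their original relative order.
    return novel + redundant
```

    Because every novel document becomes a pivot in this sketch, the saving over the greedy O(n^2) baseline depends on how few pivots the chosen radius produces.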

    Information retrieval on the blogosphere

    Blogs have recently emerged as a new open, rapidly evolving and reactive publishing medium on the Web. Rather than being managed by a central entity, the content on the blogosphere (the collection of all blogs on the Web) is produced by millions of independent bloggers, who can write about virtually anything. This open publishing paradigm has led to a growing mass of user-generated content on the Web, which can vary tremendously both in format and quality when looked at in isolation, but which can also reveal interesting patterns when observed in aggregation. One field particularly interested in studying how information is produced, consumed, and searched in the blogosphere is information retrieval. In this survey, we review the published literature on searching the blogosphere. In particular, we describe the phenomenon of blogging and the motivations for searching for information on blogs. We cover both the search tasks underlying blog searchers' information needs and the most successful approaches to these tasks. These include blog post and full blog search tasks, as well as blog-aided search tasks, such as trend and market analysis. Finally, we also describe the publicly available resources that support research on searching the blogosphere.